Abstract

The spatial propagation of many livestock infectious diseases critically depends on the animal
movements among premises, so that the knowledge of movement data may help to
detect, manage and control an outbreak. The identification of robust spreading features
of the system is however hampered by the temporal dimension characterizing population
interactions through movements. Traditional centrality measures do not provide relevant
information as results strongly fluctuate in time and outbreak properties heavily depend on
geotemporal initial conditions. By focusing on the case study of cattle displacements in
Italy, we aim at characterizing livestock epidemics in terms of robust features useful for
planning and control, to deal with temporal fluctuations, sensitivity to initial conditions,
and missing information during an outbreak. Through spatial disease simulations, we detect
spreading paths that are stable across different initial conditions, allowing the clustering
of the seeds and reducing the epidemic variability. Paths also allow us to identify premises,
called sentinels, having a large probability of being infected and providing critical information
on the outbreak origin, as encoded in the clusters. This novel procedure provides
a general framework that can be applied to specific diseases, for aiding risk assessment
analysis and informing the design of optimal surveillance systems.
1 INTRODUCTION

Livestock infectious diseases represent a major concern as they may compromise livestock welfare
and reduce productivity, induce large costs for their control and eradication [1], and may in
addition represent a threat to human health, since the emergence of human diseases is dominated
by zoonotic pathogens [2]. Disease management and control are thus very important in order
to reduce such risks and prevent large economic losses [3, 4, 5, 6, 7], and strongly depend on
our ability to rapidly and accurately detect an outbreak and protect vulnerable elements of the 
system. The major difficulty lies in the assessment and prediction of the potential consequences 
of an outbreak, and how these depend on specific conditions of the epidemic event. Control 
may be hampered by the non-localized nature of disease transmission, with animal movements 
facilitating the geographical spread of the diseases on large spatial scales [1]. The knowledge
of the pattern of movements among populations of hosts is thus crucial in that it represents 
the key driver of infection spread, defining the substrate along which transmission can occur. 
The availability of detailed datasets of animal movements allows for the explicit analysis of 
these patterns and the simulation of the spatial spreading of animal diseases among premises, 
aimed at the characterization of premises in terms of their risk of exposure or spreading potential
[8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22].

A network representation [23, 24, 25, 26] is a natural description of the set of animal movements,
with nodes corresponding to livestock-holding locations, and links referring to livestock
movements. Network approaches to epidemic spreading are widely used, leading to valuable 
and important results in the understanding of the system's properties relevant to the disease 
spreading. Different centrality measures have been investigated in order to identify the nodes 
with the largest spreading potential that should be targeted for disease control [26, 27, 28, 29, 30,
31, 32], with a focus mainly on the static properties of the spatial and topological aspects of
contact and movement patterns. The study of livestock trade movement data, however, has 
shown the presence of large heterogeneities characterizing the network from the geographical 
and temporal point of view [8, 9, 10, 11, 13, 15, 19, 20, 33, 34], with a strong dynamical
activity at the local level that limits the usefulness of projections into static properties [22].
The temporal nature of the pattern of livestock movements thus opens novel challenges limiting 
our understanding of the epidemic process because of (i) the strong dependence of the spreading
pattern on the initial conditions, both geographical and temporal [11], and (ii) the lack of
meaningful definitions of nodes' importance, given the observed large temporal fluctuations of
centrality measures based on static structural properties [22]. Both aspects limit our ability to
design robust and efficient surveillance and containment measures by strongly increasing the 
number of degrees of freedom responsible for the outbreak outcomes. 

Here we address these challenges by considering the spread of livestock diseases on the 
dataset of cattle displacements among Italian animal holdings [15, 22], where the full temporal
resolution of the dataset is considered. In order to gain a general understanding of the interplay 
between the spreading dynamics and the temporal features of the animal movements, we consider
a simple model of a notifiable highly contagious disease characterized by short timescales
where the single epidemiological unit corresponds to the farm (i.e. the node of the network) 
and transmission can occur from farm to farm through animal movements (i.e. the links of the 
network). We propose a novel method, applied to the dataset under study, that uncovers the
presence of similar spreading patterns allowing the clustering of initial conditions, thus reducing
the number of degrees of freedom, and the identification of sentinel nodes to be targeted for
disease surveillance. Appropriately parameterized applications can be considered for specific
livestock diseases where movement-related transmission is a considerable risk factor. 

2 MATERIALS AND METHODS 
2.1 Dataset and network representation 

The data on cattle trade movements used in the present study are obtained from the Italian National
Bovine database and provide a daily description of the movements of each bovine in
Italy, specifying the premises of origin and destination and the date of the movement for each
animal (identified through a unique ID) [15]. The dataset refers to the year 2007 and contains
the movements of almost 5 million bovines between more than 170,000 premises involving
96% of the Italian municipalities (see Figure 1a) [15]. The dataset can be described through a
dynamical network [11, 15, 22, 35] where the nodes correspond to premises and a directed link
represents a displacement of bovines between two premises. 
Figure 1: Properties of the cattle movement dataset. (a) Geographical representation of the
total number of animals moved during the year 2007 for each municipality of the country.
The color code is assigned according to the outgoing fluxes of displaced bovines. (b) Median
(black) and 95% confidence intervals (grey) of outgoing traffic of each premises. For the sake
of visualization the premises have been ranked by the median values. The traffic has been
evaluated on daily (top) and monthly basis (bottom). (c) Two consecutive monthly networks
(n = 3 and n = 4) have been considered. A list of premises with decreasing number of
connections is calculated on the snapshot n = 3, and is applied as a removal strategy for both
networks, i.e. from best connected premises to least connected ones, calculated on the snapshot
n = 3 only. The relative size of the giant component GC (i.e. the largest fraction of premises

By aggregating all the displacements that take place within a given time interval
$[n\Delta t, (n+1)\Delta t]$, it is possible to construct a series of temporally ordered static networks describing the
movements at a temporal resolution $\Delta t$. The 365 daily networks ($\Delta t = 1$) correspond to the
finest available temporal resolution, but other time scales (such as $\Delta t = 7$, $\Delta t = 28$, $\Delta t = 365$)
may be used [8, 10, 12, 15, 19, 20, 33].
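
As an illustration of the aggregation step, the following Python sketch groups movement records into consecutive time windows. The record layout `(day, origin, destination)`, the zero-based day index, the half-open windows and the function name `aggregate_snapshots` are assumptions made for this example; they are not the format of the national database.

```python
from collections import defaultdict

def aggregate_snapshots(movements, dt):
    """Group (day, origin, destination) records into windows [n*dt, (n+1)*dt).

    Returns {n: set of directed (origin, destination) links} for non-empty windows.
    """
    snapshots = defaultdict(set)
    for day, origin, dest in movements:
        snapshots[day // dt].add((origin, dest))
    return dict(snapshots)

# Toy records (hypothetical); days are counted from 0.
movements = [(0, "A", "B"), (3, "B", "C"), (8, "C", "A"), (8, "A", "B")]
daily = aggregate_snapshots(movements, dt=1)
weekly = aggregate_snapshots(movements, dt=7)
```

With these toy records the weekly aggregation merges the links of days 0 and 3 into a single snapshot, which is how temporal aggregation can hide the order in which movements actually occurred.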

2.2 Epidemic simulations on the dynamical network of cattle movements 

The disease spread on the dynamical network is modeled using a simple SIR compartmental 
model [36]. We assume that premises are the discrete single units of the process, neglecting
the possible impact of within-farm dynamics, as commonly assumed in the study of the spread 
of highly contagious and rapid infectious diseases through animal movements [6]. Premises
are labeled as Susceptible, Infectious, or Removed, according to the stage of the disease. All 
premises are considered susceptible at the beginning of the simulations, except for the single 
seeding farm. At each time step, an infectious farm i can transmit the disease along its outgoing 
links to its neighboring susceptible farms that become infected and can then propagate the 
disease further in the network. Here we consider a deterministic process for which the contagion 
occurs with probability equal to 1 as long as there is a directed link of cattle movements from 
an infectious farm to a susceptible one at a given time step [11]. Though a crude assumption,
this allows us to simplify the computational exploration of the initial conditions, focusing on 
the fastest infection patterns. The corresponding stochastic case is reported in the Electronic 
Supplementary Material (ESM), where both high and intermediate transmissibility rates are 
considered. After $\mu^{-1}$ time steps an infected farm becomes recovered and cannot be reinfected.
The simulation is fully defined by the choice of the timescale $\Delta t$, used to define the successive
aggregated networks, and of the initial conditions $(x_0, t_0)$, where $x_0$ is the seeding node and $t_0$
indicates the outbreak start.
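
A minimal Python sketch of the deterministic process described above is given here for illustration. The snapshot layout (a dictionary from time step to directed links), the function name `deterministic_sir`, and the convention that a newly infected farm starts transmitting at the following time step are assumptions of the sketch, not details taken from the simulations of this study.

```python
def deterministic_sir(snapshots, seed, t0, infectious_period, t_max):
    """Deterministic SIR on a temporal network.

    snapshots: {time step: iterable of directed (origin, destination) links}
    Returns {premises: time step at which it became infectious}.
    """
    infected_at = {seed: t0}
    for t in range(t0, t_max + 1):
        # Premises that are infectious during step t (not yet recovered).
        infectious = {node for node, ti in infected_at.items()
                      if ti <= t < ti + infectious_period}
        if not infectious:
            break  # the outbreak has died out
        for origin, dest in snapshots.get(t, ()):
            # Transmission with probability 1 along links leaving infectious farms;
            # farms already infected or recovered cannot be infected again.
            if origin in infectious and dest not in infected_at:
                infected_at[dest] = t + 1  # assumed: transmits from the next step on
    return infected_at

# Toy temporal network (hypothetical): the link C -> D appears too late to transmit.
snapshots = {0: [("A", "B")], 1: [("B", "C")], 9: [("C", "D")]}
path = deterministic_sir(snapshots, seed="A", t0=0, infectious_period=7, t_max=20)
```

In this toy case the set of keys of `path` plays the role of the nodes reached by the outbreak; farm D is never reached because C has already recovered when the link C -> D becomes active.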
2.3 Invasion paths and seeds' cluster detection

Given the limited applicability of quantities defined a priori to characterize the spreading potential
of a node in such a highly dynamical network, here we exhaustively explore the dependence
of the spreading process on the initial conditions and investigate the possible emergence of recurrent
patterns, aiming at identifying similar spreaders in such a complex environment. The
disease spreading pattern is encoded in an invasion path $\Gamma$ characterized by a set of nodes $V$, a
set of directed links $l$ (indicating the transmission), and a seed $x_0$. We define the overlap $\Theta_{12}$
between two paths $\Gamma_1$ and $\Gamma_2$ as the Jaccard index $\Theta_{12} = |V_1 \cap V_2| / |V_1 \cup V_2|$, measuring the
number of common nodes over the total number of nodes reached by the two paths. This measure does not consider
the information on the links of transmission from one farm to another, as we are interested in 
the observable outcome of the outbreak, namely the fact that a farm is infected or not, rather 
than the precise transmission path. We also tested an alternative definition of the overlap taking 
into account the directed links composing the paths (see the ESM). 

We have computed, at fixed $\Delta t = 1$ day and initial time $t_0$, the overlap $\Theta_{12}$ between the
invasion paths of deterministic SIR outbreaks generated by every pair of potential seeds $(x_1, x_2)$
and constructed the initial conditions similarity network (ICSN) as a weighted undirected network
in which each node is an initial condition of the epidemic and the link between two nodes
$x_1$ and $x_2$ is weighted by the value of the overlap $\Theta_{12}$, measuring the similarity of the invasion
paths they produce. By filtering the ICSN to disregard values of the similarity smaller
than a given threshold $\Theta_{th}$, subsets of nodes with similar spreading properties may emerge (see
Figure 2).
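
The construction can be sketched in Python as follows. The sketch computes the Jaccard overlap between the node sets of two invasion paths and extracts the connected components of the filtered ICSN with a union-find structure. The function names, the input layout (a dictionary from seed to the set of nodes it reaches) and the choice of keeping links with overlap greater than or equal to the threshold are assumptions of this example; the exhaustive pairwise loop is only practical for small toy inputs.

```python
from itertools import combinations

def jaccard(nodes_1, nodes_2):
    """Overlap of two invasion paths: common nodes over all nodes reached."""
    union = nodes_1 | nodes_2
    return len(nodes_1 & nodes_2) / len(union) if union else 0.0

def seed_clusters(paths, threshold):
    """paths: {seed: set of nodes reached}. Returns clusters of seeds, largest first.

    Clusters are the connected components of the similarity network once
    links with overlap below `threshold` are discarded.
    """
    parent = {seed: seed for seed in paths}

    def find(seed):
        while parent[seed] != seed:
            parent[seed] = parent[parent[seed]]  # path halving
            seed = parent[seed]
        return seed

    for s1, s2 in combinations(paths, 2):
        if jaccard(paths[s1], paths[s2]) >= threshold:
            parent[find(s1)] = find(s2)

    clusters = {}
    for seed in paths:
        clusters.setdefault(find(seed), set()).add(seed)
    return sorted(clusters.values(), key=len, reverse=True)

# Toy invasion paths (hypothetical): s1 and s2 share 5 of 6 reached nodes.
paths = {"s1": set("abcdef"), "s2": set("abcde"), "s3": {"x", "y"}}
clusters = seed_clusters(paths, threshold=0.8)
```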
Figure 2: Schematic representation of the cluster detection procedure. (a) Different simulated
invasion paths (colored lines) obtained for different seeders (corresponding colored nodes) are
shown on the network. (b) The initial conditions similarity network (ICSN) is obtained by
calculating, for any pair of initial conditions $i$ and $j$, the overlap $\Theta_{ij}$ measuring the similarity
between the invasion paths originated by $i$ and $j$. Thicker lines in the ICSN indicate a higher
overlap. (c) By removing all links of the ICSN with an overlap lower than a given threshold
$\Theta_{th}$, clusters of nodes leading to similar propagation paths emerge.



The method described above unveils a partition $\mathcal{P}(t_0)$ of the possible seeds that depends on
the starting time $t_0$ of the spreading. In order to measure the robustness of the clusters $C_i(t_0)$
at time $t$, we define the vector $p_i(t, t_0)$ with components $p_{ij}(t, t_0) = |C_i(t_0) \cap C_j(t)| / |C_i(t_0)|$, representing
the fraction of nodes of $C_i(t_0)$ present in the cluster $C_j(t)$. If the partitions are equal at times $t_0$
and $t$, each vector $p_i(t, t_0)$ will have one component equal to 1, and all the others equal to 0. If
instead the nodes of $C_i(t_0)$ are homogeneously redistributed into the $C$ clusters $C_j(t)$ of $\mathcal{P}(t)$,
$p_i(t, t_0)$ will have all components equal to $1/C$. Here we consider the $C = 20$ largest clusters
for each $t$ and measure the conditional entropy $H_i(t)$
of observing a specific redistribution among the largest $C$ clusters at time $t$, given that only a
fraction $\sigma_i(t, t_0) = \sum_j p_{ij}(t, t_0)$ of the original nodes are found within those clusters, rescaled
by its maximum value. If $C_i(t_0)$ is also a cluster of $\mathcal{P}(t)$, $H_i(t) = 0$. If its nodes are equally
divided into the $C$ clusters of $\mathcal{P}(t)$, the entropy is equal to 1. In general the entropy takes values
between its minimum value, $\min_H(t)$, and 1, where the minimum represents the best
possible configuration: all the nodes of $C_i(t_0)$ are in the same cluster of $\mathcal{P}(t)$, except the fraction
$(1 - \sigma_i)$ that do not belong anymore to the largest $C$ clusters. We explore in the ESM additional
quantities to measure the stability of the partitions.
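
As a small illustration of the quantities entering the stability measure, the Python sketch below computes the redistribution vector $p_i(t, t_0)$ and the surviving fraction $\sigma_i(t, t_0)$ for one cluster. It does not compute the entropy itself. The function name `redistribution` and the representation of clusters as Python sets are assumptions of the example.

```python
def redistribution(cluster_t0, partition_t):
    """Return (p, sigma) for one cluster C_i(t0) against the clusters C_j(t).

    p[j] is the fraction of nodes of cluster_t0 found in partition_t[j];
    sigma is the total fraction still found in any of those clusters.
    """
    size = len(cluster_t0)
    p = [len(cluster_t0 & c_j) / size for c_j in partition_t]
    return p, sum(p)

# Toy clusters (hypothetical): node "d" is no longer in any of the clusters at time t.
cluster_t0 = {"a", "b", "c", "d"}
partition_t = [{"a", "b"}, {"c"}, {"x", "y"}]
p, sigma = redistribution(cluster_t0, partition_t)
```

Here half of the original nodes stay together, a quarter moves to a second cluster, and the remaining quarter drops out of the clusters considered, so that the surviving fraction is three quarters.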

2.4 Uncertainty in the identification of the seed cluster 

The presence of similar invasion paths may be exploited for the identification of the seed cluster 
starting from a specific infected premises. In order to investigate whether this is possible, we 
explore all paths of infections and measure the number of times that any node in the network is 
reached by the epidemic, breaking down this number according to the seed cluster originating 
the epidemic. We then associate to each node $k$, reached by the disease $n_k$ times, a vector
$\pi(k)$ whose components $\pi_j(k)$ represent the probability of being infected by a seeder belonging
to the cluster $j$. If $k$ is reached each of the $n_k$ times by invasion paths rooted in premises
belonging to the same cluster $m$, the vector has components $\pi_m = 1$ and $\pi_{j \neq m} = 0$. On
the contrary, for a node $k$ infected by epidemics originated in farms belonging to a different
cluster each of the $n_k$ times, the vector elements assume the values $\pi_j = 1/n_k$. In the case
an epidemic is detected at node $k$ by the surveillance system, the vector $\pi(k)$ encodes valuable
information restricting the possible set of initial conditions. In particular, it is possible to define
an uncertainty $\xi(k)$ in the identification of the seeding cluster, by using an entropy-like function
defined as $\xi(k) = -(\log n_k)^{-1} \sum_j \pi_j \log \pi_j$. In the examples above, $\xi(k) = 0$ when $k$ is always
infected by the same cluster, and $\xi(k) = 1$ if $k$ is infected each time by a different cluster. The
normalization $\log(n_k)$ is chosen because it represents the most homogeneous situation given
that $n_k$ is always smaller than the total number of clusters. An alternative normalization factor
has also been tested in the ESM.
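
The uncertainty can be illustrated with the short Python sketch below, which takes the list of seed-cluster labels of the epidemics that reached a node and evaluates the entropy-like function with the $\log n_k$ normalization. The function name and the input layout are assumptions of the example. The sketch raises an error when the node is reached fewer than two times, since the normalization vanishes in that case; this is a choice of the sketch, not a prescription of the method.

```python
import math
from collections import Counter

def seed_cluster_uncertainty(cluster_labels):
    """Entropy-like uncertainty for a node reached len(cluster_labels) times.

    cluster_labels: seed-cluster label of each epidemic that reached the node.
    """
    n_k = len(cluster_labels)
    if n_k < 2:
        raise ValueError("log(n_k) = 0: the normalized uncertainty is undefined")
    counts = Counter(cluster_labels)
    # pi_j is estimated as the share of infections coming from cluster j.
    entropy = sum(-(c / n_k) * math.log(c / n_k) for c in counts.values())
    return entropy / math.log(n_k)

always_same = seed_cluster_uncertainty(["m"] * 5)          # one cluster only
all_different = seed_cluster_uncertainty([1, 2, 3, 4])     # a new cluster each time
mixed = seed_cluster_uncertainty(["m", "m", "m", "q"])     # mostly one cluster
```

The first two calls reproduce the two limiting cases discussed in the text (uncertainty 0 and, up to floating-point rounding, 1), while the mixed case falls in between.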
3 RESULTS

3.1 Dynamical properties of cattle movement network 

The dynamical network of bovine displacements exhibits complex features both in the structure
of the various static snapshots [8, 10, 15, 19, 20], and in the temporal fluctuations of links
and nodes [22]. In particular, the link lifetimes and the number of displaced bovines are not
characterized by a well-defined time-scale, and the centrality measures commonly used in the 
context of static networks appear unable to identify the most important nodes of the network [21,
22, 37]. An aggregated view of the system over a temporal window $\Delta t$ yields indeed a ranking
of the importance of premises that may not reflect their properties at different moments of the
system evolution, or at other aggregation timescales (see Figure 1b) [22].

Such fluctuations may strongly affect spreading processes, as premises that are poorly connected
on a given day (or week/month) may become largely connected on the next day (respectively,
week/month) and vice versa. Their impact on the efficacy of intervention measures
is clearly important. Figure 1c shows the efficacy in the reduction of the maximum possible
epidemic size, indicated by the number of premises in the giant connected component (GC), 
when quarantine measures are adopted that are based on the movements knowledge at a given 
time only. More specifically, premises are removed from a network in decreasing order of the
number of connections; however, this information is measured only on the 3rd month and applied
to the 3rd and 4th monthly networks. While such a targeted removal is effective in rapidly
decreasing the size of the largest connected component in the network of the 3rd month, it is
not at all effective for the network of the successive month. This highlights how using past information
might result in ineffective containment strategies through premises isolation for such
a highly varying temporal network, and that the characterization of the spreading properties of 
premises cannot be assessed from a topological static point of view: the full dynamical nature 
of the trade system and of the epidemic propagating on it has to be taken into account. 
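
The removal experiment can be sketched in Python as follows: premises are ranked by number of connections on one monthly snapshot, and the same list is then removed from that snapshot and from the next one. The use of weakly connected components, the normalization by the number of premises present in the snapshot, and the toy link lists are assumptions of this example rather than details of the analysis reported here.

```python
from collections import defaultdict

def degree_ranking(links):
    """Premises sorted by decreasing number of connections (in + out links)."""
    degree = defaultdict(int)
    for origin, dest in links:
        degree[origin] += 1
        degree[dest] += 1
    return sorted(degree, key=degree.get, reverse=True)

def giant_component_fraction(links, removed=frozenset()):
    """Largest weakly connected component, as a fraction of all premises."""
    nodes = {n for link in links for n in link}
    adjacency = defaultdict(set)
    for origin, dest in links:
        if origin not in removed and dest not in removed:
            adjacency[origin].add(dest)
            adjacency[dest].add(origin)
    seen, best = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first search over one component
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node] - seen:
                seen.add(neighbour)
                stack.append(neighbour)
        best = max(best, size)
    return best / len(nodes) if nodes else 0.0

# Toy monthly snapshots (hypothetical): the hub changes from "H" to "K".
month3 = [("H", "a"), ("H", "b"), ("H", "c"), ("c", "d")]
month4 = [("K", "a"), ("K", "b"), ("K", "c"), ("K", "H")]
top = set(degree_ranking(month3)[:1])  # best connected premises in month 3
effect_month3 = giant_component_fraction(month3, top)
effect_month4 = giant_component_fraction(month4, top)
```

In this toy case removing the month-3 hub breaks up the month-3 network but leaves most of the month-4 network connected, which is the qualitative behavior described above.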
3.2 Epidemic profiles and dependence on the initial conditions

We first explore the role of the aggregation timescale $\Delta t$ of the dynamical network on the disease
propagation, by analyzing the spreading patterns resulting from outbreaks starting at each
$x_0$ of the $\simeq 1.7 \cdot 10^5$ premises on the seeding date $t_0$ = January 1st, assuming an infectious
period $\mu^{-1} = 7$ days. The simulated epidemics dramatically depend on the aggregation timescale,
as shown in Figure 3 for daily, weekly, monthly and yearly networks.




Figure 3: Number of infected premises as a function of time for different aggregating time
windows $\Delta t$. Each curve represents the profile of an epidemic starting on January 1st from a
given seed.

The spreading becomes faster and reaches a larger proportion of the nodes [15] with increasing
$\Delta t$, as expected since the temporal aggregation allows propagation paths that would
otherwise be prevented by causality. Most importantly, the epidemic profiles show for short $\Delta t$
an intrinsic variability as a function of the initial conditions of the outbreak, with multiple peaks
and strong differences in peak times for different initial conditions. The aggregation on large
$\Delta t$ values leads to a loss of the network intrinsic variability and therefore to a smaller impact
of the seeds on the epidemic profiles. The large temporal fluctuations describing the premises'
activity [22] do not allow the identification of an upper bound of the timescale $\Delta t$ that could
be a good approximation to the system description, since any given infectious period $\mu^{-1}$ would
have a non-negligible interplay with a broad set of timescales that are part of the full spectrum
of timescales of the dynamical system. Therefore, in order to realistically account for the impact
of the seeding on the spread of epidemics on the dynamical network, in the following we
focus on the finest temporal scale, $\Delta t = 1$ day, for the description of the bovine mobility in
the epidemic simulations.

3.3 Similar spreaders and seeds' cluster emergence 

By fixing $\Delta t = 1$ day, we explored the results of the epidemic simulations starting from all
possible geographical initial conditions corresponding to $t_0$ = January 1st. We calculated the
overlap values among all possible pairs of initial conditions and filtered the ICSN by applying 
a threshold value for the overlap equal to 0.8. The network separates into several connected 
components, leading to a natural emergence of clusters of initial conditions. These represent 
sets of nodes that, if at the origin of an outbreak, would lead to similar invasion paths. Clusters 
are organized in a hierarchy depending on the value of $\Theta_{th}$, and it is interesting to note that,
given the distribution of similarity values obtained, even large enough values of $\Theta_{th}$ lead to the
emergence of a number of non-trivial clusters of initial conditions, i.e. different from simply
isolated nodes. The distribution of the sizes of the clusters is shown in the ESM, along with a
sensitivity analysis on the value of $\Theta_{th}$.
Figure 4: Topological representation of clusters and invasion paths corresponding to $t_0$ = January 1st.
(a) The nodes belonging to each cluster are represented in the network of bovine
displacements aggregated over the whole spreading period (grey network). The zoomed frame
shows the absence of community-like structures or chain-like motifs. (b) Each network represents
the union of all invasion paths starting from the nodes of a given cluster. The initial
conditions are not shown for the sake of visualization; the link thickness is proportional to the
number of invasion paths propagating along that connection and the size of the nodes is proportional
to the number of incoming invasion paths. Different topological structures of the invasion
paths are found for different seed clusters. All nodes belonging to a given cluster are shown with
the same color, and the same color is used in both panels for each cluster.

In Figure 4 we show the 12 largest clusters along with the displacement network aggregated
over the entire spreading period. Some important characteristics of the clusters emerge clearly. 
First, the nodes of a given cluster are not tightly connected in the aggregated displacement 
network. In addition, there is a lack of chains of infections: the nodes in the clusters are not 
trivially connected to each other by links that bring the disease from one node to the next. A 
direct analysis of the aggregated displacement network, based for instance on the search of
communities or chain-like structures, would therefore not be able to detect the similarity of
their spreading properties, even if performed using different aggregation timescales $\Delta t$.

The spatial analysis of the georeferenced representation of the clusters (where each node is 
assigned the location of the corresponding municipality) shows moreover that, although some 
clusters are formed by nodes which are geographically rather close, most clusters are dispersed, 
with a distribution of distances between nodes spanning several hundreds of kilometers (Figure 5).
Clusters can also geographically overlap and do not have mutually separated geographical
boundaries. Therefore, the geographical proximity of two nodes does not necessarily imply
that they will lead to similar invasion paths. 

Overall, neither the structural nor the geographical analysis of the dynamical network of 
displacements would be able to reveal the existence and composition of groups of nodes leading
to similar spreading patterns, and a detailed analysis of the dynamical process is needed.
Interestingly, the mixed shapes observed in the profiles of Figure 3 are automatically classified
into a set of specific and well-defined behaviors by considering initial conditions belonging to
the same cluster, as shown in Figure 6a. Grounded in the comparison of the infected nodes
and disregarding the explicit links of transmission, the clustering method is able to group the 
spreading histories into similar patterns characterized by the same timing and size. An alternative
version of the clustering method based on the overlap of the full invasion paths leads to
a similar partition, despite the fact it relies on a much larger amount of information (see the 
ESM). Similar findings are also obtained with a stochastic infection dynamics, as reported in 
the ESM. 
Figure 5: Geographical characterizations of clusters corresponding to $t_0$ = January 1st. (a) As
a paradigmatic example, 4 clusters are shown in different colors on the georeferenced network
of bovine displacements aggregated over the whole spreading period (35 days). The same color
code of Figure 4 is used. The ellipses highlight the most compact clusters in terms of geographical
dispersion. In the same area different clusters may coexist; moreover, some of them can be
rather dispersed, as shown by the clusters colored in red and in blue. (b) Cluster geographical
dispersion, calculated as the distance between each pair of nodes belonging to the same cluster
(identified by the color).
Figure 6: Seeds' clusters characterization. (a) Number of infected farms as a function of time.
Each panel reports the profiles of the epidemics starting from initial conditions belonging to
the same cluster. Four clusters of different sizes are shown as examples. (b) Entropy $H$ of the
partition into clusters at $t_0$ = January 1st as a function of time $t = t_0 + 7w$ with $w = 1, 2, 3, \ldots$,
for the same clusters as in (a). The difference $H - \min_H$ (grey bars) represents the robustness
of the cluster (the smaller the difference, the more robust the cluster), given that only part
of the nodes may survive in the partition at time $t$. Four typical behaviors can be characterized,
as explained in the main text, each reported by an example in the figure.
3.4 Longitudinal stability of the seeds' clusters

Given the strong variability of the network's properties on all timescales [22], partitions obtained
for spreading processes starting at different times could substantially differ. In order to
investigate this aspect, we compare the partition obtained at time $t_0$ with the one obtained at
time $t > t_0$ by means of the entropy function $H$ defined in the Materials and Methods section.
$H$ measures the level of fragmentation of the cluster partition in time, with small values indicating
a large stability and values close to 1 indicating the disruption of the original partition.
The lower bound ($\min_H$) represents the most stable configuration and takes into account the
possible disappearance of nodes from the partition at time $t$. We present in Figure 6b the results
corresponding to $t_0$ = January 1st and $t = t_0 + 7w$ with $w = 1, 2, 3, \ldots$, i.e. successive times
separated by $w$ weeks from $t_0$. The cluster temporal stability can roughly be classified into
four main behaviors, shown through four examples: (i) a substantial fraction of the nodes of the
cluster disappears already for $w = 1$ ($\min_H \neq 0$), and small groups of nodes are redistributed
in other clusters (small $H - \min_H$), quite stable in time (cluster 0); (ii) high stability at $w = 1$,
followed by a similar behavior (cluster 1); (iii) high stability at $w = 1$, followed by a robust
preservation of the partition for several weeks (cluster 9); (iv) a very unstable behavior, as the
cluster's nodes disappear almost completely from the partition at time $t$ (very high $\min_H$, cluster 8).
The most robust behavior in time (shown by the example of cluster 9) was found for 2
clusters out of the 20 largest clusters considered for $t_0$ = January 1st. Interestingly, it turns out
that the size of a cluster is not correlated with its stability, as shown in the ESM where the stability
of all clusters is investigated, along with additional measures of stability and a sensitivity
analysis on the $C$ values considered.

3.5 Disease sentinels 

The success of control and mitigation measures critically depends on the ability to rapidly detect 
an outbreak and identify its source. Ideally, a timely detection of the origin of the disease would 
allow a targeted strategy to isolate the infected premises and contain the propagation further. 
Longer delays between the start of the outbreak and its detection mean larger numbers of infected
farms, a more difficult identification of the starting point of the spreading, and therefore
of the propagation paths, overall leading to increasing difficulties in preventing further spread 
and to increasingly expensive containment measures. The high temporal variability and the
complex nature of the network of displacements make the identification of the possible origin
of the outbreak, following the detection of an infected node, a particularly difficult task. This
has to be factored in with partial or missing knowledge on the epidemic situation due to under-reporting
and/or the presence of a silent spread phase that would delay the first detection of the
outbreak while propagation occurs. The heterogeneous nature of the network allows however
the identification of clusters of seeds leading to similar invasion paths, that may be used to enhance
surveillance and help the inference of the origin of a disease, once an epidemic unfolds
on the network. 

Based on the cluster partition $\mathcal{P}(t_0)$ obtained from the epidemic simulations starting at time
$t_0$ from all possible initial conditions, we calculate the uncertainty $\xi$ of all premises infected
by the epidemic in identifying the seed cluster originating the outbreak. Figure 7a shows the
cumulative distribution of the uncertainty $\xi$. The number of times $n_k$ that a holding is infected
may strongly vary from one holding to the next; in particular, many nodes are in fact infected
just once ($n_k = 1$), yielding trivially high $\xi(k)$ values. We thus focus on premises that have
been infected at least 10 times. Interestingly, even with this restriction, the seeder uncertainty
is less than 40% for almost 70% of the infected nodes, meaning that most premises reached by
the infection are able to provide valuable insights about the origin of the disease in terms of the
identification of the cluster from which the spreading originated. As a result, information about
the invasion paths and the epidemic timing is also obtained, following the findings of Figure 6a.
Figure 7: Sentinel premises. (a) Cumulative probability distribution of the uncertainty of a
given premises in the identification of the seeding cluster. Slaughterhouses are discarded from
the analysis, as they cannot spread the disease further to other farms. (b) For a set of initial
conditions $(x_0, t_0)$, with $t_0$ = January 1st, each infected farm is represented by a dot in the $n$-$\xi$
phase space, with $n$ being the number of times the farm is reached by an infection. Sentinel
nodes are defined as the farms that are often reached by epidemics (i.e. $n > n_s$) and have a
low degree of uncertainty in the identification of the seeding cluster that led to the outbreak (i.e.
$\xi < \xi_s$). The plot shows the trajectories in the $n$-$\xi$ phase space of the 15 sentinels obtained by
imposing $n_s = 30$ and $\xi_s = 0.4$, for eight consecutive weeks starting from January 1st.
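
The sentinel criterion can be sketched as a simple filter in Python: a farm is kept only if, for every explored starting time, it is reached at least $n_s$ times with an uncertainty of at most $\xi_s$. The input layout, the function name `select_sentinels` and the toy values are assumptions of this example, and the non-strict inequalities follow the wording of the main text rather than the strict ones of the caption.

```python
def select_sentinels(stats_by_start, n_s, xi_s):
    """stats_by_start: {t0: {farm: (n_k, xi_k)}}.

    Returns the farms satisfying n_k >= n_s and xi_k <= xi_s for every t0;
    a farm missing for some t0 (never infected) is excluded.
    """
    sentinels = None
    for stats in stats_by_start.values():
        passing = {farm for farm, (n_k, xi_k) in stats.items()
                   if n_k >= n_s and xi_k <= xi_s}
        sentinels = passing if sentinels is None else sentinels & passing
    return sentinels or set()

# Toy values (hypothetical) for two starting times, labelled by day offset.
stats_by_start = {
    0: {"f1": (45, 0.20), "f2": (60, 0.70), "f3": (31, 0.35)},
    7: {"f1": (38, 0.30), "f3": (12, 0.10)},
}
sentinels = select_sentinels(stats_by_start, n_s=30, xi_s=0.4)
```

Lowering `n_s` or raising `xi_s` in this sketch enlarges the returned set, mirroring the trade-off between the number of sentinels and the resources needed to monitor them.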



The uncertainty $\xi(k)$ on the identification of the cluster of initial conditions infecting the
node $k$ and the number of times $n_k$ the node $k$ is reached by the epidemic clearly depend on
the time $t_0$ of the start of the epidemic. In the following, we explore the variation of these
two quantities for all nodes of the network when we consider epidemics starting at time $t_0$ =
January 1st $+ 7w$ with $w = 0, 1, 2, 3, \ldots, 8$, i.e., spanning an 8-week interval from January 1st.
In Figure 7b we represent each farm $k$ as a point with coordinates $(n_k, \xi(k))$ in the $n$-$\xi$
phase space, for $t_0$ = January 1st. As $t_0$ changes, a variety of different behaviors is obtained, as
expected given the large variability of the network. Large fluctuations of the number of times
a node is infected are observed, as a node with a large $n_k$ (i.e., often reached by the disease)
for an initial time $t_0$ may be rarely reached if the outbreak starts later, given the change in the
network of displacements, or may even disappear from the plot if it is not infected for a given
explored initial time (i.e., it has $n_k = 0$). Similarly, the values of the uncertainty in the
identification of the seeding cluster can strongly fluctuate. From the surveillance perspective,
we are interested in the nodes that are infected a large number of times (i.e., are likely reached by
the epidemic, given any temporal and geographical initial conditions) and for which we have a
low uncertainty in the identification of the seeding cluster, providing important insights into the
previous and future spreading patterns. We define these premises as sentinel nodes by imposing
that they are infected at least $n_s$ times and are characterized by an uncertainty at most equal to
$\xi_s$ for all initial conditions. Their trajectories in the $n$-$\xi$ phase space for varying $t_0$ are shown in
Figure 7b for $n_s = 30$ and $\xi_s = 0.4$. The choice of the $(n_s, \xi_s)$ threshold values depends on the
resources available to monitor these sentinels: smaller $n_s$ and larger $\xi_s$ lead to a larger number
of sentinels. In Table 1 of the ESM, we report the number of sentinels for different $(n_s, \xi_s)$
values. It is also possible to be less conservative and enlarge the group of possible sentinels for 
an efficient detection of an infectious disease by including farms with discontinuous trajectories 
that may have Uk = for one value of the starting time but have Uk > rig and < for the 
other starting times. By relaxing these constraints, it is possible to build a hierarchy of disease 
sentinels with different levels of reliability, and specific to the available surveillance resources. 
In the ESM we also tested an alternative definition of the entropy function showing that it does 



20 



Optimizing surveillance for livestock disease spreading through animal movements 



not alter the results. 
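
The sentinel definition above can be sketched in a few lines of Python. The snippet assumes that, for each starting time $t_0$, the simulations have already been summarized as a mapping from each infected node to its pair $(n_k, \zeta(k))$; this data layout and the function name `select_sentinels` are hypothetical and chosen only for illustration.

```python
def select_sentinels(stats_by_start, n_s=30, zeta_s=0.4):
    """Return the nodes meeting both thresholds for every starting time.

    stats_by_start: {t0: {node: (n_k, zeta_k)}} (hypothetical layout);
        a node that is never infected for a given t0 is simply absent
        from the corresponding inner dictionary.
    n_s: minimum number of infections required at each t0.
    zeta_s: maximum seeder uncertainty allowed at each t0.
    """
    sentinels = None
    for stats in stats_by_start.values():
        passing = {
            node
            for node, (n_k, zeta_k) in stats.items()
            if n_k >= n_s and zeta_k <= zeta_s
        }
        # keep only nodes that pass for all starting times seen so far
        sentinels = passing if sentinels is None else sentinels & passing
    return sentinels or set()
```

Lowering `n_s` or raising `zeta_s` can only enlarge the returned set, which mirrors the trade-off between the number of sentinels and the monitoring resources discussed above.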

The interest of the definition of sentinel nodes in the perspective of a surveillance system
is quantified further in Figure 8a. Given a set of sentinels, we measure the fraction of detected
outbreaks as a function of the outbreak final size, where an outbreak is considered detected if
it infects at least one sentinel farm. Figure 8a shows that sentinels are not good indicators for
the presence of small outbreaks (i.e., corresponding to sizes smaller than 5-10 infected farms);
however, a surveillance system based on only 15 sentinel nodes (out of a total number of more
than 170,000 premises) would detect more than 55% of the outbreaks with final size larger than
10 and, if the number of sentinels is increased to 32, the fraction of outbreaks detected would
be more than 75%. Finally, it is also important to consider that the information provided by
the sentinel farms is meaningful as long as the detection occurs rather early in the outbreak
evolution. We evaluated the rapidity of the detection by plotting the infection time of each of
the 15 sentinel farms (obtained with $n_s = 30$ and $\zeta_s = 0.4$) relative to the full outbreak duration,
for outbreaks with size larger than 10 (Figure 8b). Interestingly, almost all sentinels are able
to detect most outbreaks within the first third of the outbreak duration. Similar results are also
valid for the stochastic case, as reported in the ESM.
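
The two surveillance measures described in this paragraph can be illustrated with the following Python sketch. It assumes that each simulated outbreak is stored as a dictionary mapping every infected farm to its infection time, and that the outbreak duration runs from the first to the last infection; both choices, as well as the function names, are assumptions made for illustration only.

```python
def detection_fraction(outbreaks, sentinels, min_size):
    """Fraction of outbreaks larger than min_size hitting a sentinel.

    outbreaks: list of {farm: infection_time} dicts (hypothetical layout).
    sentinels: set of sentinel farm identifiers.
    """
    large = [o for o in outbreaks if len(o) > min_size]
    if not large:
        return float("nan")  # no outbreak of that size was simulated
    detected = sum(1 for o in large if not sentinels.isdisjoint(o))
    return detected / len(large)


def relative_infection_times(outbreaks, sentinel, min_size):
    """Infection times of one sentinel, rescaled to [0, 1] per outbreak."""
    times = []
    for o in outbreaks:
        if len(o) > min_size and sentinel in o:
            start, end = min(o.values()), max(o.values())
            if end > start:  # skip outbreaks with zero duration
                times.append((o[sentinel] - start) / (end - start))
    return times
```

A relative infection time below 1/3 corresponds to a detection within the first third of the outbreak duration.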

It is finally interesting to note that sentinel nodes cannot be identified through geographical,
topological or flux analyses alone. Figure 8c-d shows the properties of the sentinel nodes in
terms of the number of in- and out-connections, and of the number of batches moved in and
out of the premises, highlighting how sentinels do not share similar properties and span largely
fluctuating values in the parameter space.

Figure 8: Properties of the surveillance system based on sentinel premises. (a) Fraction of
outbreaks detected by the sentinels as a function of the minimum outbreak size of the epidemic,
for two sets of sentinels (of 15 and 32 sentinels), corresponding to ($n_s = 30$, $\zeta_s = 0.4$) and
($n_s = 30$, $\zeta_s = 0.45$), respectively. (b) Boxplot of the time of infection of the 15 sentinels
relative to the full duration of the outbreak, considering the detected outbreaks with final size
larger than 10. Each box is colored according to the number of times $n$ that the sentinel has been
infected and a grey shaded area indicates 33% of the relative infection time. (c) Topological
properties of sentinel nodes (red dots), compared to the other nodes (smaller black dots). All
premises in the system (except slaughterhouses) are represented in the plane of the number of
in-connections vs. the number of out-connections per premises. Sentinels may be characterized
by either a small or a large number of in/out connections. (d) As in (c), but showing the flux
properties in the plane of the number of batches moved in and out of each premises. Even in
this case, sentinels may assume small to large values in the parameter space.

4 DISCUSSION

The full knowledge of the livestock movements at a daily resolution makes it possible to investigate
in detail the spreading patterns of emerging livestock diseases. Through simulations on the
fully dynamic network, where daily bovine movements are explicitly captured, we have studied
the role of the initial conditions in shaping the propagation process. Clusters of seeds emerge
that lead to similar spreading patterns in terms of infected premises, and are also characterized
by similar epidemic profiles and peak times. These clusters cannot be identified from purely
structural or geographical considerations. The proposed clustering method can be used
to optimize surveillance systems and define rapid and efficient containment strategies, targeting
farms that are at high risk of being infected and of further spreading the disease. Although the
displacement network is characterized by a large temporal variability, intrinsically altering the
centrality role of nodes from a given observation time to another, it is possible to identify sentinel
nodes representing premises which are often reached by the disease and, when detected as
infected, are able to provide valuable information on the seeding farms of the outbreak and thus
on the likely spreading path, allowing the design of targeted intervention strategies. A hierarchical
classification of sentinels can be provided by tuning the constraints imposed for their definition,
leading to different levels of surveillance. Remarkably, the bare knowledge of the animal movements
would not be enough to estimate the origin of a disease, once detected, as the outbreak
results from the complex interplay of the dynamical network and the disease dynamics. On
the other hand, this interplay leads to the emergence of a very small number of sentinels, with
respect to the total number of premises present in the system, that may be efficiently used for
disease prevention and control.

Applications to specific diseases, where the timescale of the epidemic is set by the parameters
describing the disease etiology, can be performed to tune this framework to particular cases.
These findings clearly depend on the full knowledge of the displacement dataset, and can thus
be obtained as a priori information during a non-emergency period to help orient control
strategies, as commonly done with the static analysis of the contact network structure, strengthening
the importance of such data collection. The ability to make useful predictions for current
and future livestock movement patterns depends on the level of similarity across different years
of data. The analysis of successive years of movement data, uncovering possible recurrent
patterns and seasonal behaviors, may thus contribute to making this framework a general tool to
be used in real-time emergencies.